Lower and higher estimates of the number of "true analogies" between sentences contained in a large multilingual corpus
Author
Abstract
No one disputes the reality of analogies between words (e.g., "I walked" is to "to walk" as "I laughed" is to "to laugh", written I walked : to walk :: I laughed : to laugh). Computational linguists, however, seem quite doubtful about analogies between sentences: these are assumed to be too few to be of any use. We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences it contains. We give two estimates, a lower one and a higher one. Because an analogy must hold on the level of form as well as on the level of meaning, we relied on the idea that translation should preserve meaning to test for similar meanings.
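The idea of checking an analogy on both form and meaning can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' procedure. The function names, the toy English-Spanish data, and the character-count test are all assumptions made for illustration. The character-count test is only a necessary condition for a formal analogy, so it can overcount.

```python
from collections import Counter


def may_be_formal_analogy(a, b, c, d):
    """Necessary (not sufficient) condition for the formal analogy a : b :: c : d:
    each character occurs as often in a + d as in b + c."""
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)


def holds_in_both_languages(quadruple, translation):
    """Keep a candidate only if the character-count condition holds for the four
    source sentences and for their four translations (assumed meaning check)."""
    translated = [translation[s] for s in quadruple]
    return may_be_formal_analogy(*quadruple) and may_be_formal_analogy(*translated)


# Toy, hand-made data (hypothetical; not taken from the corpus used in the paper).
translation = {
    "I walked": "caminé",
    "to walk": "caminar",
    "I talked": "hablé",
    "to talk": "hablar",
}
candidate = ("I walked", "to walk", "I talked", "to talk")
print(holds_in_both_languages(candidate, translation))
```

Counting the candidates that pass in the source language alone and those that also pass in the translations would give two different figures, which is the kind of gap that a pair of estimates has to account for.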
Related resources
Analogies of form between chunks in Japanese are massive and far from being misleading
This paper relates to the assessment of the argument of the poverty of the stimulus in that we measure the number of true proportional analogies between chunks in a language with case markers, Japanese. On a bicorpus of 20,000 sentences, we show that at least 96% of the analogies of form between chunks are also analogies of meaning, thus reporting the presence of at least two million true analo...
A corpus study on the number of true proportional analogies between chunks in two typologically different languages
We measure the number of true proportional analogies between chunks in two typologically different languages on a similar corpus: a 20,000-sentence Japanese-English bicorpus. We verify that at least 96% of analogies of form between chunks are also analogies of meaning. We conclude that analogy ought to be considered a reliable structuring device between chunks.
The Relationship between the Interpersonal Intelligence and Reading Comprehension Achievement of Iranian Bilingual and Multilingual EFL Learners
The present study aimed at investigating the relationship between interpersonal intelligence of Iranian bilingual and multilingual EFL learners and their reading comprehension achievement. To do so, 60 intermediate EFL students were selected from a group of 80 based on their OPT scores. They were non-randomly divided into two experimental groups. Data collection took place during the summer sem...
Disentangling from Babylonian Confusion - Unsupervised Language Identification
This work presents an unsupervised solution to language identification. The method sorts multilingual text corpora, sentence by sentence, into the different languages they contain and makes no assumptions about the number or size of the monolingual fractions. Evaluation on 7-lingual corpora and bilingual corpora shows that the quality of classification is comparable to supervised approache...
Publication date: 2004